AITopics | continuous reinforcement learning

continuous reinforcement learning

Policy Optimization for Continuous Reinforcement Learning

Neural Information Processing Systems, Dec-24-2025, 09:18:32 GMT

We study reinforcement learning (RL) in the setting of continuous time and space, for an infinite horizon with a discounted objective and the underlying dynamics driven by a stochastic differential equation. Built upon recent advances in the continuous approach to RL, we develop a notion of occupation time (specifically for a discounted objective), and show how it can be effectively used to derive performance difference and local approximation formulas. We further extend these results to illustrate their applications in the PG (policy gradient) and TRPO/PPO (trust region policy optimization/proximal policy optimization) methods, which have been familiar and powerful tools in the discrete RL setting but under-developed in continuous RL. Through numerical experiments, we demonstrate the effectiveness and advantages of our approach.
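For orientation, the setting named in the abstract (continuous time and space, infinite horizon, discounted objective, dynamics driven by a stochastic differential equation) is commonly written along the following lines. This is a sketch in assumed notation, not the paper's own: the drift b, diffusion coefficient sigma, reward r, discount rate beta, Brownian motion B, and stochastic policy pi are symbols introduced here for illustration.

```latex
\begin{aligned}
dX_t &= b(X_t, a_t)\,dt + \sigma(X_t, a_t)\,dB_t,
  \qquad a_t \sim \pi(\cdot \mid X_t),\\
V^{\pi}(x) &= \mathbb{E}\!\left[\int_0^{\infty} e^{-\beta t}\, r(X_t, a_t)\,dt
  \;\middle|\; X_0 = x\right], \qquad \beta > 0 .
\end{aligned}
```

Under this reading, policy optimization means searching for a policy pi that makes the discounted value V as large as possible.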

continuous reinforcement learning, name change, policy optimization

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Policy Optimization for Continuous Reinforcement Learning

Neural Information Processing Systems, Oct-10-2024, 19:29:13 GMT

continuous reinforcement learning, policy optimization

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Policy Optimization for Continuous Reinforcement Learning

Zhao, Hanyang, Tang, Wenpin, Yao, David D.

arXiv.org Artificial Intelligence, Oct-18-2023

We study reinforcement learning (RL) in the setting of continuous time and space, for an infinite horizon with a discounted objective and the underlying dynamics driven by a stochastic differential equation. Built upon recent advances in the continuous approach to RL, we develop a notion of occupation time (specifically for a discounted objective), and show how it can be effectively used to derive performance-difference and local-approximation formulas. We further extend these results to illustrate their applications in the PG (policy gradient) and TRPO/PPO (trust region policy optimization/proximal policy optimization) methods, which have been familiar and powerful tools in the discrete RL setting but under-developed in continuous RL. Through numerical experiments, we demonstrate the effectiveness and advantages of our approach.
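To make the discounted objective under stochastic-differential-equation dynamics concrete, here is a minimal Python sketch that estimates it by Monte Carlo with an Euler-Maruyama discretization. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the toy dynamics dX = a dt + 0.5 dB, the quadratic reward, the discount rate, the finite truncation of the infinite horizon, and the function name `discounted_return`.

```python
import numpy as np

def discounted_return(policy, x0=1.0, beta=1.0, dt=0.01,
                      horizon=10.0, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[ int_0^T exp(-beta t) r(X_t, a_t) dt ].

    The infinite horizon is truncated at T = horizon (assumption); with
    beta = 1 the neglected tail is weighted by exp(-10) or less.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon / dt))
    x = np.full(n_paths, x0, dtype=float)
    total = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        a = policy(x)
        # Toy quadratic reward penalizing state and action size (assumption).
        reward = -(x**2 + 0.1 * a**2)
        total += np.exp(-beta * t) * reward * dt
        # Euler-Maruyama step for the toy SDE dX = a dt + 0.5 dB (assumption).
        brownian_increment = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + a * dt + 0.5 * brownian_increment
    return total.mean()

# Compare a stabilizing linear feedback policy with a do-nothing policy.
stabilizing = discounted_return(lambda x: -2.0 * x)
passive = discounted_return(lambda x: np.zeros_like(x))
print(f"stabilizing: {stabilizing:.3f}, passive: {passive:.3f}")
```

In this toy problem the feedback policy earns the higher (less negative) discounted return, which is the kind of performance difference between two policies that policy optimization methods try to exploit.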

continuous reinforcement learning, policy optimization

arXiv:2305.18901

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Adapting Double Q-Learning for Continuous Reinforcement Learning

Kuznetsov, Arsenii

arXiv.org Artificial Intelligence, Sep-25-2023

The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques. Most of these techniques are rooted in heuristics, primarily addressing the consequences of overestimation rather than its fundamental origins. In this work we present a novel approach to bias correction, similar in spirit to Double Q-Learning. We propose using a policy in the form of a mixture with two components. Each policy component is maximized and assessed by separate networks, which removes any basis for the overestimation bias. Our approach shows promising near-SOTA results on a small set of MuJoCo environments.
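The overestimation bias mentioned in the abstract, and the Double Q-Learning idea it refers to, can be illustrated with a small numerical experiment. The sketch below is not the paper's mixture-policy method. It only shows the classic effect: taking a maximum over noisy value estimates is biased upward, while selecting an action with one estimate and evaluating it with an independent one is not. The number of actions, the noise level, and the assumption that all true values are zero are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials, noise_std = 10, 20000, 1.0

# All actions are equally good, so the true maximum value is exactly 0.
true_q = np.zeros(n_actions)

# Two independent noisy estimates of the same action values.
q_a = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))
q_b = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))

# Single estimator: select and evaluate with the same noisy values.
single = q_a.max(axis=1).mean()

# Double estimator: select the action with q_a, evaluate it with q_b.
best = q_a.argmax(axis=1)
double = q_b[np.arange(n_trials), best].mean()

print(f"single-estimator mean: {single:.3f}")  # clearly above the true value 0
print(f"double-estimator mean: {double:.3f}")  # close to the true value 0
```

Because the selection noise and the evaluation noise are independent in the second case, the error that made an action look best does not carry over into its evaluated value.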

algorithm, overestimation bias, q-network

arXiv:2309.14471

Country:

Asia > Georgia > Tbilisi > Tbilisi (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Hedging: Continuous Reinforcement Learning for Hedging of General Portfolios across Multiple Risk Aversions

Murray, Phillip, Wood, Ben, Buehler, Hans, Wiese, Magnus, Pakkanen, Mikko S.

arXiv.org Machine Learning, Jul-15-2022

We present a method for finding optimal hedging policies for arbitrary initial portfolios and market states. We develop a novel actor-critic algorithm for solving general risk-averse stochastic control problems and use it to learn hedging strategies across multiple risk aversion levels simultaneously. We demonstrate the effectiveness of the approach with a numerical example in a stochastic volatility ...

However, since the idealized assumptions of a complete market do not apply in real markets, it is not surprising that complete market models require constant manual adjustments and oversight, for example adjusting delta by "skew delta", smoothing barriers priced with local volatility, and taking into account market impact when trading vega to hedge auto-callable products.
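To give a feel for what evaluating a position "across multiple risk aversion levels" can look like, the following Python sketch scores one set of profit-and-loss samples under several risk aversion parameters. It assumes an entropic (exponential-utility) risk measure, which is a common choice in the hedging literature but is not stated in the text above. The function name `entropic_risk`, the simulated P&L, and the list of risk aversion values are all hypothetical.

```python
import numpy as np

def entropic_risk(pnl, risk_aversion):
    """Entropic risk (1/lambda) * log E[exp(-lambda * pnl)].

    Uses a log-sum-exp shift so large exponents do not overflow.
    """
    z = -risk_aversion * np.asarray(pnl, dtype=float)
    shift = z.max()
    return (shift + np.log(np.mean(np.exp(z - shift)))) / risk_aversion

rng = np.random.default_rng(0)
# Hypothetical hedged P&L samples: zero mean, unit variance.
pnl = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Score the same P&L distribution under several risk aversion levels.
risks = {lam: entropic_risk(pnl, lam) for lam in (0.1, 0.5, 1.0, 2.0)}
for lam, value in risks.items():
    print(f"risk aversion {lam}: entropic risk {value:.3f}")
```

The same P&L distribution is charged a larger risk as the risk aversion grows, so the hedging policy that is best for one level need not be best for another, which is why a method that covers many levels at once is useful.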

machine learning, portfolio, reinforcement learning

arXiv:2207.07467

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States (0.04)

Genre: Research Report (0.64)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
